This SQL Trick Cut My Query Time by 80%
🚪The Problem That Ate My Time (and Sanity)
Let me set the scene.
It's just another Tuesday. Power BI dashboard open, Spotify focus playlist on, coffee in hand, and I'm waiting, waiting, still waiting. I had just kicked off a query that was taking ages. Again.
You see, I was building a customer retention dashboard: pull order history, compute the average days since each customer's last purchase, filter out churning users, and display the results by region. It should have taken a few seconds. Instead, it took over a minute every single time.
Multiply that by the fifteen-plus times I had to test and tweak it each day, and you can imagine the agony.
🧠The "Your SQL Logic Might Be the Problem" Epiphany
Like every data analyst who gets stuck, I grumbled to my team.
"I already indexed the date column."
"It's not even that big a dataset."
"It must be the BI tool being slow."
Then a senior data engineer asked me one simple question:
"Are you executing transformations within joins or aggregations?"I blinked. "Maybe, huh?"
After examining my question, she identified the precise paragraph that was impairing my performance in ten seconds.
🔍My Original Query (Clean but Expensive)
SELECT
    c.id AS customer_id,
    c.first_name,
    c.last_name,
    AVG(DATEDIFF(day, o.order_date, GETDATE())) AS avg_days_since_order
FROM
    orders o
JOIN
    customers c ON o.customer_id = c.id
WHERE
    o.status = 'Completed'
GROUP BY
    c.id, c.first_name, c.last_name
HAVING
    AVG(DATEDIFF(day, o.order_date, GETDATE())) > 30
So what's the problem? I was computing DATEDIFF for every single row before aggregation, and then computing it all over again in HAVING. On millions of rows, that's double the work.
✨The Solution: Do the Work in a CTE First
WITH order_days AS (
SELECT
customer_id,
DATEDIFF(day, order_date, GETDATE()) AS days_since_order
FROM
orders
WHERE
status = 'Completed'
)
SELECT
c.id,
c.first_name,
c.last_name,
AVG(o.days_since_order) AS avg_days_since_order
FROM
order_days o
JOIN
customers c ON o.customer_id = c.id
GROUP BY
c.id, c.first_name, c.last_name
HAVING
    AVG(o.days_since_order) > 30
🚀The Effect: From 90 Seconds to 18 Seconds
This wasn't just a speed boost; it was a productivity unlock.
I could run, iterate, and publish changes faster. The dashboard was snappy, so users stopped complaining. The product team even Slacked me: "Whatever you did, it's working!"
🧩Why This Works (The Breakdown)
- CTEs cut redundant computation. DATEDIFF is computed once instead of twice.
- Filtering early leaves fewer rows to process. SQL has less work to do later.
- Joins get faster, especially when you aren't injecting heavy logic up front.
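The whole pattern above can be sketched end to end in SQLite (via Python's sqlite3), with the T-SQL DATEDIFF/GETDATE pair swapped for SQLite's julianday, and the sample schema and names purely illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE orders (customer_id INTEGER, order_date TEXT, status TEXT);
INSERT INTO customers VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
INSERT INTO orders VALUES
  (1, date('now', '-45 days'), 'Completed'),
  (1, date('now', '-60 days'), 'Completed'),
  (2, date('now', '-5 days'),  'Completed'),
  (2, date('now', '-10 days'), 'Pending');
""")

# CTE first: compute days-since-order once, filter to completed orders early,
# then aggregate and join against the already-reduced row set.
rows = conn.execute("""
WITH order_days AS (
    SELECT customer_id,
           julianday('now') - julianday(order_date) AS days_since_order
    FROM orders
    WHERE status = 'Completed'
)
SELECT c.id, c.first_name, AVG(o.days_since_order) AS avg_days
FROM order_days o
JOIN customers c ON o.customer_id = c.id
GROUP BY c.id, c.first_name
HAVING AVG(o.days_since_order) > 30
""").fetchall()

print(rows)  # only Ada qualifies (avg ≈ 52.5 days since order)
```

Notice that the aggregate expression in HAVING now references a value that was computed exactly once in the CTE, rather than re-running a per-row date calculation.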
💡Real-World Places to Apply This Trick
✅ Reporting Views: Use this pattern in SQL views that feed your reports. Your future self will thank you.
✅ ETL Pipelines: Clean and filter before joining, especially in large fact/dimension models.
✅ Customer Segmentation: Pre-compute the math before deriving metrics like "days since last order" or "total sessions in 30 days".
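The fact/dimension case is worth a quick sketch too: aggregating the fact table inside a CTE before the join means the join sees one row per dimension key instead of one row per fact. The tables and names here are hypothetical, again using SQLite for a runnable illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (region_id INTEGER, amount REAL);
CREATE TABLE dim_region (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO dim_region VALUES (1, 'North'), (2, 'South');
INSERT INTO fact_sales VALUES (1, 100), (1, 250), (2, 75);
""")

# Pre-aggregate the (large) fact table first, then join the tiny result
# to the dimension table: the join handles one row per region, not one per sale.
rows = conn.execute("""
WITH region_totals AS (
    SELECT region_id, SUM(amount) AS total
    FROM fact_sales
    GROUP BY region_id
)
SELECT d.name, t.total
FROM region_totals t
JOIN dim_region d ON t.region_id = d.id
ORDER BY d.name
""").fetchall()

print(rows)  # [('North', 350.0), ('South', 75.0)]
```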
🔧Bonus: Tools to Profile and Tune Your Queries
- SQL Server: Execution Plan (Ctrl + M in SSMS)
- PostgreSQL: EXPLAIN ANALYZE
- BigQuery: Query Execution Details
- Snowflake: the Query Profile tab
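If you've never used one of these, SQLite's EXPLAIN QUERY PLAN is a much simpler cousin you can try from Python in seconds. It reports the access strategy (index search vs. full scan) without executing the query; the table and index here are made up for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_date TEXT, status TEXT)")
conn.execute("CREATE INDEX idx_status ON orders(status)")

# Each plan row's last column is a human-readable description of the step,
# e.g. a SEARCH using idx_status for this equality filter.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE status = 'Completed'"
).fetchall()
for row in plan:
    print(row[-1])
```

The heavyweight tools in the list above show the same kind of information, plus per-step timings and row counts, which is exactly what you need to spot a clause doing redundant work.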
🧵The Lesson Behind the Story
I hadn't realised that every analyst who runs queries is also a part-time DBA.
And if your query takes 90 seconds to run, you aren't doing data analysis.
You're waiting for data.
🏁 Final Takeaway:
- Move heavy computations into a CTE.
- Pre-aggregate whatever you can.
- Filter early, not late.
If you found this insightful, you can buy me a coffee ☕.